APPROACHES TO THE VALIDATION OF LEARNING HIERARCHIES

Lauren B. Resnick and Margaret C. Wang

Learning Research and Development Center

University of Pittsburgh

Several independent lines of investigation over the past decade have focused on problems
of the temporal order in which cognitive behaviors are acquired. Developmental
psychologists, particularly those exploring the implications of Piaget's theories of cognitive
development, have been interested in demonstrating the existence of regular sequences in
the acquisition of concepts and logical operations. At the same time, test and measurement
specialists interested in "criterion-referenced testing" have recognized that test batteries
based on reliably established acquisition sequences might offer a means of economically
estimating performance on a variety of specific behaviors from a relatively small number of
test items. Finally, curriculum and instructional designers have been interested in identifying
optimal sequences for teaching new skills and concepts. Although these three groups have
rather different goals, their concern with sequence in the acquisition of behavior has given
them a common interest in the twin problems of generating and validating "behavioral
hierarchies," that is, sets of behaviors which can be shown to be acquired in an invariant
sequence, implying that later behaviors are dependent upon, or in some sense "built out of,"
earlier ones.

The developmental psychologist's interest in hierarchies derives largely from a
concern for verifying the existence of invariant stages in development, through which all
children pass. Hierarchical "stage" theories of development have been proposed by many
developmental theorists, of whom the most frequently cited with respect to cognitive
development is Piaget (Flavell, 1963; Kohlberg, 1968). Such theories essentially predict the
order in which certain behaviors (concepts, intellective and also physical skills) will appear.
They do not necessarily imply a "maturational" as opposed to a "learning," or
organism-environment interaction, theory of how such changes occur (cf. Spiker, 1966).

Most studies of developmental sequence have employed cross-sectional designs in
which samples of several ages are tested on a set of behaviors. An empirical sequence can
then be derived from the percentages of children able to perform the tasks at various ages.
An example of data from a cross-sectional study appears in Figure 1. The study, by Elkind
(1961), examined the ages at which conservation of mass, weight, and volume were acquired.
Note that the percentage of children conserving mass mounts sharply at age 7; the same rise
in percentage takes place at age 9 for weight, and does not take place at all (up to the age
of 11) for volume. These data show a clear order of difficulty among the three tasks, and they
suggest the hypothesis that each individual child acquires conservation of mass first, then
weight, and finally volume.

Insert Figure 1 about here.

A cross-sectional study, however, cannot directly test the hypothesis that the order
of acquisition is invariant for each individual, i.e., that the behaviors are hierarchically
organized. Longitudinal studies, in which an initial sample of children is re-examined
over a period of years, would permit the testing of hierarchical sequences. However,
longitudinal studies are extremely difficult and costly to mount. Despite general recognition
of their value to developmental psychology, relatively few such studies of intellectual
development have actually been conducted.¹

A few psychologists have seen in scalogram analysis, originally developed by
Guttman (1944) as a method of scaling responses to attitude questionnaires, a technique that
could combine the power of longitudinal studies to examine intraindividual sequence
contingencies with the speed and lower cost of cross-sectional studies (e.g., Wohlwill,
1960). These methods have been applied to sequences of behaviors in the areas of haptic
perception, logical judgements, moral judgements (Peel, 1959), number concepts (Wohlwill,
1960), and classification skills (Kofsky, 1963).

Like cross-sectional studies, scalogram studies require the administration, to a group
of subjects, of a battery of tests presumed to sample behaviors at various points in a linear
hierarchy. Although the age of subjects may vary, age itself is not the independent
variable in scalogram studies. Instead, scores on the test battery are examined for
"scalability," the extent to which the tests can be arranged in an order such that passage of
a certain test reliably predicts passage of all tests lower in the scale.² Figure 2 shows a
hypothetical set of perfectly scaled data. Subjects are listed down the side, tests across the
top. Note that once a subject fails a test ("0" indicates failure), he fails all subsequent tests.
Similarly, if a subject passes a test, he has passed all earlier tests. The existence of such a
"perfect" scale, or an acceptable approximation to it, is taken to confirm the existence of a
behavior hierarchy. While the sequence of acquisition is not observed directly, it is inferred
from the fact that individuals who can perform higher level behaviors show evidence of having
also learned, or otherwise acquired, all lower level behaviors. The lower level behaviors,
in other words, appear to be prerequisites for the higher level ones.
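
The defining property of a perfect scale can be stated as a short check. The following is a
minimal sketch in Python, offered only as an illustration; the function name and the small
data set are assumptions made for the example and are not the data of Figure 2. Each row
holds one subject's results, with tests ordered from lowest to highest in the hypothesized
hierarchy.

```python
from typing import Dict, List


def is_perfect_scale(scores: Dict[str, List[int]]) -> bool:
    """Return True if no subject passes a test after failing an easier one.

    Each row lists one subject's results (1 = pass, 0 = fail), with tests
    ordered from lowest to highest in the hypothesized hierarchy.
    """
    for row in scores.values():
        failed_earlier = False
        for result in row:
            if result == 0:
                failed_earlier = True
            elif failed_earlier:
                return False  # a pass that follows a failure breaks the scale
    return True


# Hypothetical pass/fail data (illustrative only, not taken from Figure 2).
perfect = {
    "S1": [1, 1, 1, 1],
    "S2": [1, 1, 1, 0],
    "S3": [1, 1, 0, 0],
    "S4": [1, 0, 0, 0],
    "S5": [0, 0, 0, 0],
}
flawed = dict(perfect, S6=[1, 0, 1, 0])

print(is_perfect_scale(perfect))  # True
print(is_perfect_scale(flawed))   # False
```

In practice an acceptable approximation rather than a perfect scale is sought, so a check of
this kind serves mainly to make the definition concrete.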



Insert Figure 2 about here. 



Educational test designers have become interested in scalogram analysis primarily
as a means of constructing test batteries for diagnostic or "placement" purposes (e.g.,
Cox & Graham, 1966; Ferguson, 1969; Kropp, Stoker & Bashaw, 1966). In such testing, the
aim is to determine in which specific parts of a curriculum an individual needs instruction
rather than to assess a general "level" of performance or to compare individuals or groups.
For this purpose, it is often necessary to test large numbers of specific behavioral objectives.
This can be an exceedingly complex and time-consuming procedure. The existence of
empirically validated hierarchies can permit substantial economy in placement testing, since
subjects who pass a test at the top of a hierarchy can be assumed to be capable of passing
all lower level tests. Thus, by testing the top objectives in a number of hierarchies, a
student's general "entering level" can be quickly assessed. Subjects who fail the top-level
tests in a given hierarchy can then be tested for the lower level objectives to determine
specific instruction needs.
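
The economy of top-down placement testing can be sketched as follows. This is an
illustrative Python sketch under stated assumptions: the hierarchy is linear and already
validated, the function name `place_student` and the objective labels are hypothetical, and
the rule of stopping at the first objective passed (treating everything below it as mastered)
is one possible reading of the procedure, not a rule given in the text.

```python
from typing import Callable, Dict, List


def place_student(
    hierarchy: List[str], passes: Callable[[str], bool]
) -> Dict[str, List[str]]:
    """Test from the top of a linear hierarchy downward.

    `hierarchy` is ordered from lowest to highest objective. Testing stops
    at the first objective passed; all objectives below it are assumed
    mastered (an assumption that holds only for a validated hierarchy).
    """
    tests_given: List[str] = []
    mastered_count = 0
    for index in range(len(hierarchy) - 1, -1, -1):
        objective = hierarchy[index]
        tests_given.append(objective)
        if passes(objective):
            mastered_count = index + 1
            break
    return {
        "assumed_mastered": hierarchy[:mastered_count],
        "needs_instruction": hierarchy[mastered_count:],
        "tests_given": tests_given,
    }


# Hypothetical student who can perform only the two lowest objectives.
known = {"A", "B"}
result = place_student(["A", "B", "C", "H"], lambda objective: objective in known)
print(result["tests_given"])        # ['H', 'C', 'B']
print(result["needs_instruction"])  # ['C', 'H']
```

A student who passes the top objective receives a single test, which is the source of the
saving described above.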

To learning psychologists and curriculum designers, hierarchies represent a means
of sequencing learning tasks in such a way as to maximize transfer from one task to another
in order to facilitate the learning of successively more complex behaviors. This means that
the requirement of predicting passage of tests lower in the hierarchy is subordinated to the
requirement of generating hierarchies in which training on one task has a predictable effect on
learning tasks higher in the hierarchy. These two requirements, prediction downward and
learning facilitation upward, are closely related. However, they are not necessarily completely
correlated. It is theoretically possible for objectives to scale perfectly, but for
instruction in a task lower in the scale not to produce significant amounts of transfer to
higher level objectives. On the other hand, it may be possible to construct highly efficient
instructional sequences which introduce objectives without having first established all
prerequisite behaviors specified in a scale. Researchers interested in the use of hierarchies
as a means of sequencing instructional objectives, therefore, are necessarily concerned
that hierarchy validation studies seek to establish independently the scaling properties
of hierarchies and their learning transfer properties. The extent to which transfer and
scaling relationships coincide can then become a matter for empirical investigation.

Gagné (1962) was the first to formally propose the use of learning hierarchies in
designing educational programs, although various methods of "task analysis," leading to
hierarchy-like structures, had been used in developing industrial and military training
programs for some time (Miller, 1965). Gagné has outlined a procedure by which behaviors
can be analyzed by asking the single question, "What kind of capability would an individual
have to possess to be able to perform this task successfully, were we to give him only
instructions?" One or more subordinate tasks are specified in response to this question.
The question is then applied to the subordinate tasks themselves, and so on successively
down the hierarchy until tasks that can be reasonably assumed in the student population are
reached. In our own work we have been developing rather more formal methods of generating
hierarchies (Resnick, 1968; in preparation). Our method is based on an analysis of skilled
performance that has certain features in common with the technique of "protocol analysis"
developed by Newell (1966) in connection with information processing and
computer-simulation studies. We also insist on a rigorous specification of stimulus and
response in our task definitions, which has the effect of keeping each of our tasks more
"unitary" than most of Gagné's. Operationally, this means that fewer test items would be
needed to sample each task in our hierarchies than in Gagné's.

Insert Figure 3 about here. 

Figure 3 is an example of one of our hypothesized learning hierarchies. Each box
defines a task. The entry above the line defines the stimulus situation; the entry below
the line, the response. The simpler behaviors, according to our analysis, appear at the
bottom of the chart; the more complex behaviors toward the top. Note that this hierarchy,
like most of Gagné's, is non-linear. For example, behavior E is considered prerequisite
both to G and F, and H is shown as having two prerequisites, C and F. For instructional
purposes, sequences ABC and DEF could be taught simultaneously, or either one might
come first; but both would have to be learned before H could be acquired. This branching
characteristic permits us to recognize within a hierarchical framework much of the variety
and complexity that characterize learning patterns. For this reason, we believe that
hierarchies of this kind more accurately reflect psychological reality than do the linear
hierarchies mainly used by developmental psychologists (e.g., Wohlwill, 1960; Kofsky, 1963)
and by testers (e.g., Cox & Graham, 1966). However, a branching hierarchy poses certain
knotty problems in validation methodology. These are the problems to which much of our
current work in hierarchies is addressed, and to a discussion of which we now turn.

Our first validation studies were concerned with the "scaling" properties of a set
of hierarchies in the area of early quantification skills. Figure 3 represents one of the
hierarchies studied. A battery of criterion-referenced tests was developed (Wang, 1968),
one for each of the objectives included in the hierarchies. The battery of tests was
administered to a sample of kindergarten children in September, 1968, before any formal
instruction in the curriculum was given. The results of these tests were then analyzed for
scaling properties.

Our first analyses represented an attempt to adapt existing linear scaling procedures
to the validation of branching hierarchies. For this purpose we used Multiple Scalogram
Analysis (MSA), a procedure developed by Lingoes (1963). This procedure was selected for
several reasons. First, it can not only validate or refute a hypothesized sequence but can also
suggest a more nearly optimal sequence or set of sequences. It also provides multi-dimensional
information about the tests in a given scale. When the data demand it, it can yield multiple
scales rather than rejecting the scale hypothesis for the set treated as a whole. With respect
to statistical reliability, MSA contains a measure to control for spuriously high estimates
of "reproducibility," Guttman's classical measure of scalability. This is an important
feature of the program, since the possibility of inflated reproducibility indices, due to extreme
pass or fail rates on certain tests in the battery, has been one of the major criticisms of
Guttman's method in the past (Loevinger, 1947; Festinger, 1949; Green, 1956; White & Saltz,
1957; Edwards, 1948, 1957; Lingoes, 1963). Finally, a computer program has been developed
for MSA, the Format Free Multi-Scaling Program (SCALE); therefore, MSA is an economical
and convenient procedure to use, especially when dealing with large sets of data.

Although the MSA program is capable of picking out multiple scales, these scales
are independent of one another, having no objectives in common. Once an objective is
selected for inclusion in a scale, it is no longer considered for membership in other scales.
For example, with respect to Figure 3, if objective H were to scale with C, B, and A it could
not, in the same analysis, appear in a scale with F, E, and D. In order to apply the program
to validate a branching hierarchy, therefore, it was necessary to test separately each of
the linear pathways implied by the hierarchy. For the hierarchy shown in Figure 3 we ran
five separate analyses: A B C H I K; A B C H I J; D E F H I K; D E F H1 H2 J1 J2; and D E G.
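
The decomposition of a branching hierarchy into linear pathways can be sketched in code.
The following Python sketch is illustrative only: the function name is hypothetical, and the
edge list covers just the part of Figure 3 that is spelled out in the text (sequences ABC and
DEF leading to H, and E leading to G), so it does not reproduce the five analyses actually
run. It assumes the hierarchy contains no cycles.

```python
from typing import Dict, List


def linear_pathways(upward: Dict[str, List[str]]) -> List[List[str]]:
    """List every bottom-to-top path in a branching hierarchy.

    `upward` maps each objective to the objectives for which it is an
    immediate prerequisite. The hierarchy is assumed to be acyclic.
    """
    higher = {h for targets in upward.values() for h in targets}
    nodes = set(upward) | higher
    roots = sorted(nodes - higher)  # objectives with no prerequisite
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        path = path + [node]
        targets = upward.get(node, [])
        if not targets:
            paths.append(path)  # reached a top-level objective
            return
        for target in targets:
            walk(target, path)

    for root in roots:
        walk(root, [])
    return paths


# Partial, illustrative encoding of the relations described for Figure 3.
partial_figure_3 = {
    "A": ["B"], "B": ["C"], "C": ["H"],
    "D": ["E"], "E": ["F", "G"], "F": ["H"],
}
for path in linear_pathways(partial_figure_3):
    print(" ".join(path))
# A B C H
# D E F H
# D E G
```

Each printed pathway corresponds to one separate scalogram analysis, which shows how
quickly the number of required analyses grows with the number of branches.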

The input data for the analyses consisted of a pass or fail score for each subject on
each test. The index of the degree to which the objectives are sequenced is operationally
defined as the reproducibility criterion for Guttman scales:

$$\mathrm{Rep} = 1 - \frac{\text{Sum of Errors}}{\text{Total Responses}}$$

Error is defined as a case where a subject passes a higher level objective and fails a lower
level objective. In this study, the criterion of reproducibility was set at .85. This meant that
only those tests that could enter a scale with a reproducibility equal to or greater than .85
were included in the scale.
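
A reproducibility index of this form can be computed as in the following Python sketch. It
is a hedged illustration, not the MSA program: the way errors are counted here (each
response that departs from the ideal pattern implied by the subject's total score) is one
common convention and is an assumption of the example, the correction MSA applies for
extreme pass or fail rates is not included, and the data are hypothetical.

```python
from typing import List


def reproducibility(rows: List[List[int]]) -> float:
    """Compute Rep = 1 - (sum of errors / total responses).

    Each row is one subject's results (1 = pass, 0 = fail), with tests
    ordered from lowest to highest in the hypothesized scale. An error is
    counted for every response that differs from the ideal pattern for the
    subject's total score (an assumed counting convention).
    """
    errors = 0
    total = 0
    for row in rows:
        passed = sum(row)
        ideal = [1] * passed + [0] * (len(row) - passed)
        errors += sum(1 for seen, expected in zip(row, ideal) if seen != expected)
        total += len(row)
    return 1 - errors / total


# Hypothetical data: the third subject passes test 3 while failing test 2.
rows = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
rep = reproducibility(rows)
print(round(rep, 2))  # 0.9
print(rep >= 0.85)    # True
```

With 2 deviating responses out of 20, the index is .90, which would satisfy a criterion of
.85 of the kind used in this study.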

Insert Figure 4 about here. 

The results of these analyses are shown in Figure 4. For each analysis the first
column shows the hypothesized scale and the second column shows the empirical scale
generated by MSA. Analysis 1 shows that K and I (counting ordered and unordered arrays
of objects) had been placed too high in the hypothesized sequence. These counting tasks,
according to the data, should come before tasks involving numerals (B, C, H1, H2). The basic
sequence with respect to learning numerals (A, then B, then C, then H), however, was
confirmed. Matching numerals (A) appeared as prerequisite to counting, but this may have
been an artifact of the very high rate of passing test A. Where nearly all subjects in a
sample can perform a behavior, scaling may show it as prerequisite even to unrelated
behaviors. Analysis 3 tests the sequence of all counting objectives (D, E, F, I, and K) and
suggests that counting fixed arrays (K and I) comes before counting out a subset from a
larger set (F). Even counting out a set (F), however, should come before using numerals
(H1 and H2), according to this analysis. In combination, Analyses 1 and 3 suggest that our
initial hierarchy introduced numerals too early in the counting sequence. The implication,
not directly tested in these analyses, is that counting of various kinds must be established
before numeral recognition can be learned. Analyses 2 and 4 support this interpretation,
and also suggest a reordering of the subobjectives in H and J.

Insert Figure 5 about here. 

On the basis of these analyses, it was possible to construct a new learning hierarchy,
rearranging the original objectives. This hierarchy is shown in Figure 5. The five objectives
involving counting of objects (D, E, K, I, F) are now in a linear order, with numeral
identification (B) appearing as an upward branch from I. Visual matching of numerals (A) is
shown as prerequisite only to numeral identification and reading (B and C) because, despite its
apparent relationship to K and I in the empirical scales, it did not seem reasonable to expect
that learning visual matching of numerals would help in learning to count. H and J
subobjectives appear in the new order suggested by the analyses. This order seems quite
reasonable since both H1 and J1 involve counting a set (of objects or events) in response to
a symbolic presentation, and both H2 and J2 involve selecting symbols to match sets.
Counting claps (G) is retained as a separate branch. As with all post-hoc interpretations,
of course, it will be necessary to test this reordered hierarchy using new samples of subjects
before accepting its validity.

Insert Figure 6 about here. 



In this first application, Multiple Scalogram Analysis proved usable, although
awkward in requiring so many separate analyses. Our next attempt to apply MSA, however,
was to reveal more serious complications. Figure 6 shows the results of an attempt to
test the hierarchical relations between counting skills (Q I and Q II) and two methods of
comparing set size, (a) by one-to-one correspondence (Q VII B, C, D) and (b) by counting
each set (Q VII E, F, G). Our hypothesis in this case was a linear one. We predicted that
children would first learn to count five objects (I D, E, F), then ten objects (II D, E, F); and
that they would then learn to compare sets, first by one-to-one correspondence (VII B, C, D)
and then by counting (VII E, F, G). The empirical analysis yielded three independent linear
scales. Scale 1 includes all of the objectives for counting to five, in the predicted order,
but also suggests that children learn rote counting to ten (II D) before they learn to count
five objects. One objective for comparing by counting (VII E) falls into this scale. However,
the objectives for counting objects to ten (II E and F) do not. Instead they appear in Scale 2
along with comparing by one-to-one correspondence (VII B and C) and the other comparing
by counting objective (VII F). One objective (VII D) did not fall into either scale and appears
by itself as Scale 3.

There are several difficulties in interpreting these results. Some of the difficulties
derive from MSA's restriction to independent linear scales. For example, it is unlikely
that counting objects to ten (II E and F) is truly independent of counting to five (I E and F).
In MSA, however, the tests could not enter Scale 1 unless they also scaled with objective
VII E. A possible hierarchy for these objectives is an upward branch in which counting
to five leads both to counting to ten and to comparing sets.

[Inline diagram: counting to five at the base, branching upward to counting to ten and to
comparing sets.]

However, using MSA, this hypothesis could have been tested only by running two separate
analyses: I D, II D, I E, I F, VII E, VII G; and I D, II D, I E, I F, II E, II F. Similarly,
comparing via one-to-one correspondence may be prerequisite to comparing via counting,
although not to simple counting. Here a downward branch can be proposed in which both
one-to-one correspondence and counting are prerequisite to comparison by counting.
Again, however, this hierarchy is not directly testable under the assumptions of MSA.

Another source of difficulty in interpretation derives from the use of so many
separate tests for closely related objectives. Possibly, by combining related behaviors we
might produce more stable measures of the key classes of behavior and thus generate more
easily interpretable scales. To explore this possibility, we next combined all tests of
counting to five and gave a single pass or fail score for the set of tests. The same was done
for the tests of counting to ten. Similarly, we computed a single score per subject for all
tests covering the use of numerals to five and another for the numerals to ten. Finally,
tests for comparing sets were combined to yield one score for the counting method and one
score for the one-to-one correspondence method. These six summary scores were then
analyzed using Multiple Scalogram Analysis. The results appear in Figure 7.

Insert Figure 7 about here.

In this analysis, all of the objectives involving counting fall into a single, quite
easily interpreted scale. According to this scale, skill in counting objects is acquired
before the numerals are learned (I before II, and III before IV), but both counting and
numerals to five are learned before the child learns to count to ten. Comparison of sets
by counting is acquired only after basic counting and numeration are established. Comparison
of sets by one-to-one correspondence (V) appears in this analysis as an independent class
of behaviors, neither dependent upon nor prerequisite to counting and numeration skills.
This finding seems reasonable with respect to simple counting and numeration skills
(Objectives I through IV). However, it seems unlikely that the two comparison skills
(Objectives V and VI) are completely unrelated to each other. In the MSA program, once
Objective VI was shown to scale with Objectives I through IV it could not be considered for
membership in a scale with Objective V. Although a separate program run for Objectives V
and VI alone would have been technically possible, the assumptions of the Guttman scaling
procedure make the testing of two-item scales a questionable procedure. Thus, there was no
acceptable means, within the "scalogram" framework, of testing the hypothesis of a
conjunctive branch in which both counting and numeration to 10 (Objectives III and IV) and
comparison of sets by one-to-one correspondence (Objective V) are prerequisite to comparison
by counting (Objective VI).

The repeated awkwardness of Guttman scaling procedures in dealing with branching
hierarchies led us to search for an alternative validation method whose assumptions would
more closely match those of our hierarchical theory. Our requirements were the following:

1. Our hierarchies are generated one level at a time, by first identifying components
of the terminal behavior, next identifying prerequisites of these components, then prerequisites
of the prerequisites, and so on in a succession of individual "analyses." This means that the
critical relationships in a hierarchy are those between vertically adjacent items (e.g., in
Figure 3, between F and H, E and F, C and H, E and G, etc.) rather than across an
entire scale. Thus, it was appropriate to seek a method of validation that tested these
adjacent relationships directly and did not immediately seek to construct multi-test scales
or summary statistics covering an entire hierarchy.

2. The validation method should provide a means of testing several kinds of branches.
These include (a) upward branches, in which a single objective is prerequisite to two or
more higher level objectives (e.g., in Figure 3, E is prerequisite to both F and G); (b)
downward conjunctive branches, in which several objectives are jointly prerequisite to a
single higher level one (e.g., in Figure 3, F and C must both be learned before H can be
learned); and (c) downward disjunctive branches, in which any one of several objectives is a
prerequisite to a higher level one. Figure 8 shows a downward disjunctive branch. The
hierarchy hypothesizes that in order to compare the number of objects in two rows (C) the
child can either count the sets (A) or use a method of one-to-one correspondence (B). He
need not, however, be able to perform both A and B.

Insert Figure 8 about here.

3. The method selected should, ideally, permit a process of "search" among
objectives for hierarchical relationships not previously hypothesized. These would in effect
provide hypotheses for subsequent studies. While this is not a theoretical requirement, the
possibility of such searches would be a valuable tool during the early stages of research
in a new area. This capability will of course require a computerized analysis capable of
handling large quantities of data and of considering many alternative relationships.

Other investigators have used procedures that met the first of these requirements.
Gagné's various hierarchy studies (Gagné, 1962; Gagné et al., 1962; Gagné & Paradise,
1961) used pass and fail contingencies for adjacent objectives in a hierarchy to compute a
"proportion of positive transfer" statistic, essentially the inverse of the percentage of
cases in which an individual passes a higher level test while failing the lower level
"prerequisite." Walbesser's (1968) proposed method for validating the AAAS science
curriculum also uses pass-fail contingencies to test the "dependency" of each individual
objective on its immediate prerequisites. Both Gagné and Walbesser directly test downward
conjunctive hypotheses, by combining data for two or more prerequisite tests and assigning a
"pass" score only if all tests are passed. Upward branches are not tested directly, but
are in effect implied when each of two higher-level objectives is shown to have the same
lower-level objective as its prerequisite. However, neither Gagné nor Walbesser has
discussed methods of testing downward disjunctive branches. Finally, neither of these
methods is appropriate for empirical construction of hierarchies from test data, as opposed
to validation of deductively analyzed hierarchies.
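
The pass-fail contingency approach for a downward conjunctive branch can be illustrated
with a short Python sketch. Everything in it is an assumption made for illustration: the
function names, the hypothetical scores for a higher level objective H and its two
prerequisites C and F, and in particular the ratio used in `proportion_consistent`, which is
only one reading of a "proportion of positive transfer" statistic and should be checked
against the cited papers before being attributed to them.

```python
from typing import Dict, List


def conjunctive_pass(prerequisites: List[List[int]]) -> List[int]:
    """Combine prerequisite tests: a subject passes only if all are passed."""
    return [int(all(results)) for results in zip(*prerequisites)]


def contingency(higher: List[int], lower: List[int]) -> Dict[str, int]:
    """Count the four pass/fail patterns; the higher level test is listed first."""
    counts = {"++": 0, "+-": 0, "-+": 0, "--": 0}
    for high, low in zip(higher, lower):
        counts[("+" if high else "-") + ("+" if low else "-")] += 1
    return counts


def proportion_consistent(counts: Dict[str, int]) -> float:
    """Share of relevant cases that do not contradict the hypothesis.

    Assumed formula: (++ and --) divided by (++, -- and +-), where +- means
    passing the higher level test while failing the prerequisite.
    """
    relevant = counts["++"] + counts["--"] + counts["+-"]
    if relevant == 0:
        raise ValueError("no relevant cases")
    return (counts["++"] + counts["--"]) / relevant


# Hypothetical scores for eight subjects (1 = pass, 0 = fail).
C = [1, 1, 1, 0, 1, 0, 0, 1]
F = [1, 1, 0, 1, 1, 0, 0, 1]
H = [1, 1, 0, 0, 0, 0, 1, 1]

counts = contingency(H, conjunctive_pass([C, F]))
print(counts)  # {'++': 3, '+-': 1, '-+': 1, '--': 3}
print(round(proportion_consistent(counts), 3))  # 0.857
```

The single "+-" case, a subject who passes H without passing both C and F, is the only
pattern that contradicts the conjunctive hypothesis.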



Dr. John Carroll of ETS in Princeton has developed a hierarchy validation procedure
that meets the requirements outlined in paragraphs 1 and 2, and which will also be, once
a computer program is completed, quite economical to apply to large quantities of data,
thus permitting empirical search for hierarchical relationships (Carroll, 1969). Carroll's
method, like those of Gagné and Walbesser, begins with the construction of pass-fail
contingency tables for all possible pairs of items in the hierarchy. Phi/Phimax coefficients⁴
are then computed for each table. When the coefficient reaches an acceptable level a
hierarchical relationship between the two items is inferred, with the test showing the higher
pass rate considered prerequisite to the one with the lower pass rate. On the basis of
these simple prerequisite relationships, it is possible to construct a hierarchy which can
have both linear and branching sections.
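
A phi/phimax coefficient for one pair of tests can be computed as in the sketch below. This
is an illustrative Python sketch, not Carroll's program: the function names, the threshold
value of 0.8, and the data are assumptions made for the example (the text says only that
the coefficient must reach "an acceptable level"). Phimax is taken as the largest phi the two
pass rates allow, which occurs when everyone who passes the harder test also passes the
easier one.

```python
import math
from typing import List, Optional, Tuple


def phi_over_phimax(x: List[int], y: List[int]) -> float:
    """Ratio of the observed phi to the maximum phi the pass rates allow."""
    n = len(x)
    px = sum(x) / n
    py = sum(y) / n
    pxy = sum(1 for a, b in zip(x, y) if a and b) / n  # proportion passing both
    spread = math.sqrt(px * (1 - px) * py * (1 - py))
    if spread == 0:
        raise ValueError("phi is undefined when a test is passed by all or by none")
    phi = (pxy - px * py) / spread
    phimax = (min(px, py) - px * py) / spread
    return phi / phimax


def infer_prerequisite(
    name_x: str, x: List[int], name_y: str, y: List[int], threshold: float
) -> Optional[Tuple[str, str]]:
    """Return (prerequisite, dependent), or None if no relation is inferred.

    The test with the higher pass rate is treated as the prerequisite.
    """
    if sum(x) == sum(y) or phi_over_phimax(x, y) < threshold:
        return None
    return (name_x, name_y) if sum(x) > sum(y) else (name_y, name_x)


# Hypothetical scores for eight subjects (1 = pass, 0 = fail).
x = [1, 1, 1, 1, 1, 1, 0, 0]
y = [1, 1, 1, 0, 0, 0, 0, 0]  # everyone who passes y also passes x
z = [1, 0, 0, 0, 0, 0, 1, 1]  # passing z goes mostly with failing x

print(phi_over_phimax(x, y))                    # 1.0
print(infer_prerequisite("X", x, "Y", y, 0.8))  # ('X', 'Y')
print(infer_prerequisite("X", x, "Z", z, 0.8))  # None
```

Dividing by phimax removes the ceiling that unequal pass rates place on phi, so that a pair
with no contradicting cases reaches 1.0 whatever the difficulty of the two tests.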

Insert Figure 9 about here. 



Figure 9 shows a hierarchy derived from applying Carroll's program to the data
analyzed in Figure 6. The hierarchy contains both upward branches and downward
conjunctive branches. Each of these types of branches can be logically derived from the simple
prerequisite relationships.⁵ Downward disjunctive branches, however, must be tested
directly. The Carroll program will do this by combining two tests and giving them a pass
score if either of the two tests was passed. Phi/Phimax coefficients will then be computed
for these new scores. Since the computer program for disjunctive contingencies has not
yet been completed, and hand calculation is extremely tedious, we have not yet applied this
analysis to our data. However, we believe that the study of alternate routes to learning
objectives, the essence of the disjunctive hypothesis, may be one important means of
accounting for individual differences within a hierarchical framework.
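
The difference between disjunctive and conjunctive scoring can be made concrete with a
small Python sketch. It is an illustration under stated assumptions, not the Carroll
program: the function names are hypothetical, and the scores are invented to mirror the
kind of branch described for Figure 8, in which comparing sets (C) may be reached either by
counting (A) or by one-to-one correspondence (B).

```python
from typing import List


def disjunctive_pass(alternatives: List[List[int]]) -> List[int]:
    """Combined score: pass if either (any) of the alternative tests is passed."""
    return [int(any(results)) for results in zip(*alternatives)]


def conjunctive_pass(prerequisites: List[List[int]]) -> List[int]:
    """Combined score: pass only if all of the prerequisite tests are passed."""
    return [int(all(results)) for results in zip(*prerequisites)]


def violations(higher: List[int], combined: List[int]) -> int:
    """Count subjects who pass the higher test but fail the combined score."""
    return sum(1 for high, low in zip(higher, combined) if high and not low)


# Hypothetical scores for six subjects (1 = pass, 0 = fail).
A = [1, 0, 1, 0, 0, 1]  # counting the sets
B = [0, 1, 1, 0, 0, 0]  # one-to-one correspondence
C = [1, 1, 1, 0, 0, 0]  # comparing the number of objects in two rows

print(violations(C, disjunctive_pass([A, B])))  # 0
print(violations(C, conjunctive_pass([A, B])))  # 2
```

In these invented data every subject who passes C has at least one of the two routes
available, so the disjunctive hypothesis fits without exception while the conjunctive
hypothesis is contradicted twice. The combined disjunctive score could then be paired with
C in a contingency table in the same way as any single test.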

The hierarchy in Figure 9 shows many branches, with very short linear paths.
It is in some respects easier to interpret than the scales shown in Figure 6. Essentially,
the hierarchy breaks up Scale 1 of Figure 6, showing rote counting to ten (II D) as not
prerequisite to counting objects to five (I E and F), but as dependent upon rote counting
to five (I D). This is precisely what would be expected from a behavioral and logical
analysis of counting skills. On the other hand, the hierarchy also shows the five tests
of Scale 2 as being unrelated to one another. This result is not so easy to interpret;
behavioral analyses would have predicted that VII C would remain dependent upon VII B,
and II F on II E. Further testing using new subject samples and, where necessary, revised
tests, will be needed both to clarify the substantive issues raised here and to further explore
the characteristics of Carroll's validation method.

In the research discussed up to this point, attention has focused exclusively on the
possibility of predicting lower level behaviors from performance on higher level ones. No
attempt has been made in these studies to directly study the effects of learning lower level,
presumably prerequisite, skills on the learning of higher level behaviors. To study these
transfer effects, experiments involving instruction in the elements of the hierarchy are
required. Such experiments, by directly inducing acquisition of certain behaviors, permit
more direct tests of transfer hypotheses.

Gagné (1962) reported an exploratory study in which ability to perform a terminal
task, given verbal directions only and no "practice," was measured before and after
completion of a hierarchically arranged teaching program which stopped short of the terminal
objective. This study in effect measured transfer to the terminal task from all of the
subordinate learning sets combined. Other studies by Gagné (Gagné et al., 1962; Gagné &
Paradise, 1961), as well as a more recent study by Ford & Meyer (1966), use a combination
of instruction and scale analysis to test transfer among the subordinate sets themselves.

In each of these studies subjects worked through a teaching program designed to
teach each of the behaviors in the hierarchy. Although the programs were designed to teach
with a minimum of errors, demonstrated mastery of one unit was not required in order
to move to the next unit. Thus it was possible to "complete" the program without mastering
all of the behaviors taught. Upon completion of the program subjects were tested on
mastery of each separate behavior in the hierarchy. The data were examined to determine
the percentage of subjects able to perform each behavior who were not also able to perform
the predicted prerequisites for that behavior, that is, for scaling "errors." A low rate
of such errors indicated that mastery of the "prerequisite" was needed in order to profit
from direct instruction in the higher-level objective and thus confirmed the hierarchical
hypotheses.

A study by Merrill (1965) introduced a mastery criterion into the teaching program
itself as a means of testing the transfer characteristics of a hierarchy. Some subjects were
given correction and review on successive tasks within a program until they reached a
criterion of mastery; other subjects continued through the program regardless of mastery
of the successive tasks. Merrill assumed, in accord with hierarchical theory, that mastery
of lower level tasks would produce faster, more accurate learning and better retention of
higher level tasks. He thus predicted that the correction and review group would go through
the program more quickly and would perform better on immediate and delayed post-tests
than the other group. These predictions were not borne out, and Merrill concluded that
mastery of tasks lower in a hierarchy is not essential to learning a higher level task. It
should be pointed out, however, that the hierarchy on which Merrill's teaching program
was based had not been independently validated. Thus, Merrill's results may simply mean
that the particular hierarchy studied is invalid rather than that hierarchically ordered
sequences in general do not produce positive transfer.

All of the studies just described have attempted to study transfer properties of
an entire hierarchy, and each has used a fairly extensive teaching program as its instructional
vehicle. An alternative strategy is to study transfer relationships between adjacent pairs
of behaviors in a hierarchy or among short sequences of behaviors. This strategy, while
requiring many more separate studies than the total hierarchy approach, permits much
tighter experimental design. In addition, as Gagné (1968) has pointed out, it puts hierarchy
research in contact with a past body of psychological research in transfer variables. A
number of experimental designs for such small-scale transfer studies are possible.

One such design is to teach several behaviors in each of several different orders
to different groups of subjects and to take repeated measurements of achievement of all
behaviors during the course of instruction. Uprichard (1969) used this approach in studying
various sequences of instruction for the basic mathematical concepts of "greater than" (G),
"less than" (L), and "equivalent to" (E). Six groups of nursery school children received
small group instruction in these three concepts, each group learning the concepts in a
different sequence. A test covering all three concepts was administered at the end of each
week of instruction. When three out of the four subjects in a group reached criterion on
the concept being taught, the entire group moved on to the next concept in its sequence. The
week-by-week test scores on each concept for each of the groups provided the basic data
in this study. Only the groups who were taught E first reached criterion on a concept in
the first week of instruction. The groups beginning with G and L reached criterion on E in
the third or fourth week of instruction, without ever being taught the concept directly. The
groups beginning with L learned only E in four weeks of instruction and had not learned L
when the experiment ended. Thus, the data make it clear that E is the easiest to learn of
the three concepts and L the hardest. The group taught in the order E-G-L was first to reach
criterion on all three concepts (in the fourth week), thus suggesting that this is the optimal
order for teaching the three concepts. However, the data are not entirely clear in this
respect, since the G-E-L group reached criterion on both G and E in the third week, at
a time when the E-G-L group had still acquired only E.

A more sensitive measure of learning is available when subjects are run individually;
trials to criterion or error rates on each task in the learning situation itself can then be
used as the dependent variable. Assume that two behaviors are taught in two orders, A-B
and B-A, to two groups of subjects. According to hierarchical theory, if B is dependent
on A then trials to criterion for task B in the order A-B should be significantly lower than for
the same task in order B-A. An additional implication is that in order B-A, A should be
"learned" virtually without error in the formal presentation, since the subject must somehow
have learned A on his own in order to have acquired B. Finally, the total number of
trials for tasks A and B combined should be lower in A-B than in B-A order, since the
former would be a more efficient order in which to teach the set of tasks.
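
The three predictions of this two-order design can be written out as a small sketch. It is a
simplified illustration only: the function name, the data layout, the sample values, and the
cutoff used to decide that task A was "already known" are all assumptions, and the sketch
compares group means without the significance tests an actual analysis would require.

```python
from statistics import mean


def check_order_predictions(order_ab, order_ba, already_known_cutoff=2.0):
    """Compare mean trials to criterion under teaching orders A-B and B-A.

    Each argument is a list of (trials_on_A, trials_on_B) pairs, one pair per
    subject. `already_known_cutoff` is an arbitrary illustrative threshold:
    a mean at or below it is read as "A was learned virtually at once".
    """
    mean_b_after_a = mean(b for _, b in order_ab)   # task B taught second
    mean_b_first = mean(b for _, b in order_ba)     # task B taught first
    mean_a_after_b = mean(a for a, _ in order_ba)   # task A taught second
    total_ab = mean(a + b for a, b in order_ab)
    total_ba = mean(a + b for a, b in order_ba)
    return {
        "B_faster_after_A": mean_b_after_a < mean_b_first,
        "A_nearly_immediate_after_B": mean_a_after_b <= already_known_cutoff,
        "AB_order_more_efficient": total_ab < total_ba,
    }


# Hypothetical trials-to-criterion data for three subjects per group.
order_ab = [(6, 4), (8, 5), (7, 3)]
order_ba = [(1, 12), (2, 10), (1, 14)]

predictions = check_order_predictions(order_ab, order_ba)
for name, holds in predictions.items():
    print(f"{name}: {holds}")
```

With these invented values all three predictions hold; real data need not agree on all three.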

A recently completed experiment (Resnick, Siegel and Kresh, in preparation) used this
design in a study of double-classification skills in young children. Two tasks were used.
Both required the child to correctly place objects in the cells of a matrix. In task A the
defining attribute for each row and column was "given" to the child in the form of a filled
"attribute" or "edge" cell. In task B, there were no attribute cells and the subject had to infer
the defining attribute from filled interior cells in the matrix. A typical matrix for each task
appears in Figure 10. We hypothesized that task B was dependent upon task A. In accord
with the predictions just outlined, our results showed significantly more trials to criterion
for task B when it came first than when it was preceded by task A. In addition, the predicted
"immediate" learning of task A in second place did occur for subjects who had succeeded in
learning B. However, the number of trials to criterion for the two tasks combined was
not significantly different for the two orders.

Insert Figure 10 about here. 

Members of our staff are now designing several other transfer experiments, which
will be run over the next several months. We view such studies as a means not only of
ordering specific behaviors, but also of exploring the relations between hierarchical
sequences and actual teaching procedure. For example, we intend to explore the conditions
under which practice on a terminal behavior may be more efficient than learning a hierarchical
set of subordinate behaviors. We will also want to ask, as we have begun in the study just
reported, what effect practice on the terminal behavior has on learning subordinate behaviors.
Eventually, as the parameters of transfer in learning hierarchies become clearer, we hope
it will be possible to define individual differences in learning as a function of the ways in
which hierarchical structures are acquired. Some individuals, for example, may be able
to skip over certain behaviors in a hierarchy while others may need explicit instruction at
every step. Similarly, some may need extensive practice, to the point of "overlearning,"
before a newly learned behavior facilitates learning of a higher level objective, while others
may show transfer effects from brief exposure.

With respect to applied work in curriculum design and evaluation, our work will
continue to be concerned with defining and sharpening the role of hierarchical analysis, and
in particular with determining the extent to which scalability of tests accurately predicts
transfer relations among the behaviors. To explore this question, it will be necessary to
conduct both psychometric studies, in which batteries of tests are administered and examined
for hierarchical relationships, and experimental training studies, in which the behaviors
in question are taught and transfer effects evaluated. By conducting both types of studies
on each major hierarchy investigated, we expect to be able to examine empirically the
extent to which scaling properties of hierarchies have direct implications for teaching
sequences. We will also be able to explore the extent to which varying teaching sequences
can produce differing scale structures. As these relationships become clearer, behavior
analysis and learning hierarchies can be expected to become increasingly valuable
tools in educational research and development.
Footnotes



One example of a longitudinal study of intellectual development is Piaget’s study of
his own three children reported in "The Origins of Intelligence in Children" (1952).

The term "test" is used here and throughout this paper to denote a collection of
individual items which are presumed to measure the same behavior and for which a
single "pass" or "fail" score can be assigned. Thus, "tests" are treated in this 
research the way "items" were treated in Guttman's original work. An "objective", 
as used here, is a description of the behavior sampled in a test. It represents an 
intended outcome of instruction. 

A "Criterion-referenced test" is an achievement test developed to assess the presence 
or absence of a specific criterion behavior described in an instructional objective. 
Such a test provides information about the competence of a student that is independent 
of the performance of other students. For further discussion of criterion-referenced 
tests, see Glaser, 1963. 

"Phi" is essentially an estimate of the correlation between two tests, each scored 
dichotomously. Phimax is an estimate of the highest-possible phi coefficient given 
the marginals of the contingency table. Since phimax would become larger as the 
pass or fail rate of either test became more extreme, the use of phimax in the 
denominator controls against artificial inflation of the association due to extreme 
pass or fail rates. 
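
As an illustration only, the ratio can be computed from the four cells of a two-by-two
pass/fail table. The sketch uses the standard phi formula for dichotomous scores and takes
phimax to be the phi value obtained when the joint pass rate is as large as the marginals
allow; whether this matches the exact computation used in the research described is an
assumption, and the function name and sample counts are hypothetical.

```python
from math import sqrt


def phi_and_phimax(both_pass, only_x, only_y, both_fail):
    """Return (phi, phimax, phi / phimax) for two dichotomously scored tests.

    Arguments are subject counts: passed both tests, passed only test X,
    passed only test Y, and failed both.
    """
    n = both_pass + only_x + only_y + both_fail
    px = (both_pass + only_x) / n   # pass rate on test X
    py = (both_pass + only_y) / n   # pass rate on test Y
    denom = sqrt(px * (1 - px) * py * (1 - py))
    if denom == 0:
        raise ValueError("phi is undefined when everyone passes or fails a test")
    phi = (both_pass / n - px * py) / denom
    # The largest possible joint pass rate is the smaller of the two marginals.
    phimax = (min(px, py) - px * py) / denom
    return phi, phimax, phi / phimax


# Hypothetical counts: nobody passes Y without also passing X.
phi, phimax, ratio = phi_and_phimax(both_pass=40, only_x=30, only_y=0, both_fail=30)
print(f"phi={phi:.3f}, phimax={phimax:.3f}, phi/phimax={ratio:.3f}")
```

In this invented table the ratio reaches its ceiling because no subject passes the second test
while failing the first, even though phi itself is well below 1.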

Direct testing of downward conjunctive branches is not logically necessary. If a test
is independently dependent on each of two other tests, then it cannot logically be
passed unless each of its prerequisites is passed. Nevertheless, Carroll is
planning to include an empirical check on this deduction by combining two or more
tests to yield a single pass or fail score and then computing phi/phimax coefficients
for the combined scores.
Cross Sectional Study Analysis

Per Cent of Conservation Responses for
Mass, Weight, and Volume
at Successive Age Levels (N=25 at Each Age Level)*

Type of                     Age level
quantity       5     6     7     8     9    10    11

Mass          19    51    70    72    86    94    92
Weight        21    52    51    44    73    89    78
Volume         0     4     0     4     4    19    25

* From Elkind, 1961, Table 1
Figure 2



A Perfect Guttman Scale

A Comparison of the Hypothesized and the Empirical
Scales for Quantification I

Figure

Reordered Hierarchy for Quantification I

Figure 6




Hypothesized Scale

Q I     D (Rote count 0-5)
        E (Count moveable objects 0-5)
        F (Count out a set 0-5)
Q II    D (Rote count 6-10)
        E (Count moveable objects 6-10)
        F (Count out a set 6-10)
Q VII   B (Pair sets: equal, unequal)
        C (Pair sets: more, less)
        D (Pair sets: most, least)
        E (Count sets: equal, unequal)
        F (Count sets: more, less)
       *G (Count sets: most, least)

Empirical Scale (Scale 1, Scale 2, Scale 3), entries in extracted order:
I D, VII B, VII, II D, II E, I E, VII F, I F, VII C, VII E

* Eliminated from consideration because all S’s failed.

Comparison of Hypothesized and Empirical Scale for
Counting Objects and Comparison of Sets
(N=37)

Figure 7
Hypothesized Scale                                                      Empirical Scale
                                                                        (Scale I, Scale II)

Objective I    (Counting objects 0-5)                                   I  V
Objective II   (Using numeral representation 0-5)                       II
Objective III  (Counting objects 6-10)                                  III
Objective IV   (Using numeral representation 6-10)                      IV
Objective V    (Comparison of set size by one to one correspondence)    VI
Objective VI   (Comparison of set size by counting)

Reproducibility                                                         .957    1.000

Comparison of the Hypothesized and Empirical Scales
Basic Number Concept Units
(N=37)

Figure